Shrinkage-based similarity metric for cluster analysis of microarray data.

Authors

Vera Cherepinsky

Jiawu Feng

Marc Rejali

Bud Mishra

Abstract

The current standard correlation coefficient used in the analysis of microarray data was introduced by M. B. Eisen, P. T. Spellman, P. O. Brown, and D. Botstein [(1998) Proc. Natl. Acad. Sci. USA 95, 14863-14868]. Its formulation is rather arbitrary. We give a mathematically rigorous correlation coefficient of two data vectors based on James-Stein shrinkage estimators. We use the assumptions described by Eisen et al., also using the fact that the data can be treated as transformed into normal distributions. While Eisen et al. use zero as an estimator for the expression vector mean μ, we start with the assumption that for each gene, μ is itself a zero-mean normal random variable [with a priori distribution N(0, τ²)], and use Bayesian analysis to obtain a posteriori distribution of μ in terms of the data. The shrunk estimator for μ differs from the mean of the data vectors and ultimately leads to a statistically robust estimator for correlation coefficients. To evaluate the effectiveness of shrinkage, we conducted in silico experiments and also compared similarity metrics on a biological example by using the data set from Eisen et al. For the latter, we classified genes involved in the regulation of yeast cell-cycle functions by computing clusters based on various definitions of correlation coefficients and contrasting them against clusters based on the activators known in the literature. The estimated false positives and false negatives from this study indicate that using the shrinkage metric improves the accuracy of the analysis.
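The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the single noise variance beta2 pooled across genes, and the positive-part James-Stein form are assumptions made here for illustration. The sketch subtracts gamma times the sample mean from each expression vector before correlating, so gamma = 0 reproduces the zero-offset coefficient of Eisen et al. and gamma = 1 reproduces the Pearson coefficient. For a N(0, τ²) prior on μ and N observations with noise variance beta2, the standard conjugate-normal posterior mean of μ is gamma times the sample mean with gamma = τ² / (τ² + beta2 / N).

```python
import numpy as np

def posterior_shrinkage(tau2, beta2, n):
    """Weight gamma on the sample mean for a N(0, tau2) prior on mu
    and n observations drawn from N(mu, beta2)."""
    return tau2 / (tau2 + beta2 / n)

def james_stein_gamma(X):
    """Positive-part James-Stein estimate of gamma from an
    (M genes x N conditions) array. Illustrative only."""
    m, n = X.shape
    means = X.mean(axis=1)
    # Assumption: one noise variance beta^2 shared by all genes,
    # estimated by averaging the per-gene sample variances.
    beta2 = X.var(axis=1, ddof=1).mean()
    gamma = 1.0 - (m - 2) * (beta2 / n) / np.sum(means ** 2)
    return float(np.clip(gamma, 0.0, 1.0))

def offset_correlation(x, y, gamma):
    """Correlation of x and y after subtracting gamma * mean from each."""
    dx = x - gamma * x.mean()
    dy = y - gamma * y.mean()
    return float(np.dot(dx, dy) / np.sqrt(np.dot(dx, dx) * np.dot(dy, dy)))
```

A typical use would be to estimate gamma once from the whole expression matrix with james_stein_gamma and then pass it to offset_correlation for every gene pair before clustering.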


Similar resources

Incorporating biological knowledge into distance-based clustering analysis of microarray gene expression data

MOTIVATION Because co-expressed genes are likely to share the same biological function, cluster analysis of gene expression profiles has been applied for gene function discovery. Most existing clustering methods ignore known gene functions in the process of clustering. RESULTS To take advantage of accumulating gene functional annotations, we propose incorporating known gene functions into a n...


Composite Kernel Optimization in Semi-Supervised Metric

Machine-learning solutions to classification, clustering and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance as compared with traditional metrics. This has recently stimulated a considerable interest in the to...


Biweight Correlation as a Measure of Distance between Genes on a Microarray

The underlying goal of microarray experiments is to identify genetic patterns across different experimental conditions. Genes contained in a particular pathway or that respond similarly to experimental conditions should be coregulated and show similar patterns of expression on a microarray. Using any of a variety of clustering methods or gene network analyses, we can partition genes of interest...


Improved Estimation of Correlation in Microarray Data Analysis

In the original clustering work of Eisen et al., which produced one of the most frequently re-analyzed microarray gene expression datasets, the authors claimed to have “found in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function.” However, they measured similarity between any pair of genes us...


Biweight Correlation as a Measure of Distance between Genes on a Microarray

Motivation: The underlying goal of microarray experiments is to identify genetic patterns across different experimental conditions. Genes that are contained in a particular pathway or that respond similarly to experimental conditions should be co-expressed and show similar patterns of expression on a microarray. Using any of a variety of clustering methods or gene network analyses we can partit...



Journal:

Proceedings of the National Academy of Sciences of the United States of America

Volume 100, Issue 17

Published 2003
